Improving Mention Detection Robustness to Noisy Input
Abstract
Information-extraction (IE) research typically focuses on clean-text inputs. However, an IE engine serving real applications yields many false alarms due to less-well-formed input. For example, IE in a multilingual broadcast processing system has to deal with inaccurate automatic transcription and translation. The resulting presence of non-target-language text in this case, and non-language material interspersed in data from other applications, raise the research problem of making IE robust to such noisy input text. We address one such IE task: entity-mention detection. We describe augmenting a statistical mention-detection system in order to reduce false alarms from spurious passages. The diverse nature of input noise leads us to pursue a multi-faceted approach to robustness. For our English-language system, at various miss rates we eliminate 97% of false alarms on inputs from other Latin-alphabet languages. In another experiment, representing scenarios in which genre-specific training is infeasible, we process real financial-transactions text containing mixed languages and data-set codes. Because we do not train on similar data, the improvement on these data is smaller but still significant. These gains come with virtually no loss in accuracy on clean English text.
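As a rough illustration of reducing false alarms from spurious passages, the sketch below gates a mention detector behind a simple check of whether the passage looks like English. This is not the method described in the abstract, which states only that the approach is multi-faceted. The function-word list, the `english_score` and `gate_mentions` names, the 0.15 threshold, and the toy detector are all hypothetical choices made for this example.

```python
# Hypothetical sketch: every name and constant here is an assumption,
# not something taken from the paper.
ENGLISH_FUNCTION_WORDS = frozenset(
    "the of and to in a is that for it on with as was at by be this are from".split()
)


def english_score(passage: str) -> float:
    """Fraction of purely alphabetic tokens that are common English function words."""
    tokens = [t.lower() for t in passage.split() if t.isalpha()]
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in ENGLISH_FUNCTION_WORDS)
    return hits / len(tokens)


def gate_mentions(passage, detect_mentions, threshold=0.15):
    """Run the detector only when the passage looks like English.

    Passages scoring below the (assumed) threshold yield no mentions,
    which suppresses false alarms on non-English or non-language input.
    """
    if english_score(passage) < threshold:
        return []
    return detect_mentions(passage)


def toy_detector(passage):
    """Stand-in for a real mention detector: flags capitalized tokens."""
    return [t for t in passage.split() if t[:1].isupper()]


english = "The bank of England is in London"
french = "Le président de la République est arrivé à Paris"
codes = "TRN 00451 USD 1200.00 REF X9Z"

for text in (english, french, codes):
    print(gate_mentions(text, toy_detector))
```

A single threshold like this trades misses against false alarms: raising it suppresses more spurious passages but also risks discarding short or unusual English text. A real system would presumably combine several such signals rather than rely on one heuristic.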
Similar resources
On the Detection of Unknown Input in Positional Control Problems with Noisy Measurements
The paper examines the conditions for isolating unknown-input detection from the effects of measurement noise in the important family of positional control problems. The study is motivated by the need to improve the homing performance of interceptor missiles against randomly maneuvering targets. The required isolation is possible if, in addition to noisy relative position measurement...
Detecting Effectiveness of Outliers and Noisy Data on Fuzzy System Using FCM
Fuzzy systems, an artificial intelligence technique, are applicable to control and decision support systems. Fuzzy systems are created using membership functions (MFs), which are modeled on a dataset. Therefore, there is a relation between the uncertainty of the input data and the fuzziness expressed by the MFs. Outliers and noisy data are kinds of uncertainty that affect the membership functions. Thus...
Towards improving speech detection robustness for speech recognition in adverse conditions
Recognition performance decreases when recognition systems are used over the telephone network, especially over wireless networks and in noisy environments. It appears that inefficient speech/non-speech detection (SND) is an important source of this degradation. Therefore, speech detection robustness to noise is a challenging problem to be examined, in order to improve recognition performance for the ...
Noisy images edge detection: Ant colony optimization algorithm
The edges of an image define the image boundary. When the image is noisy, it is not easy to identify the edges. Therefore, a method needs to be developed that can identify edges clearly in a noisy image. Many methods have been proposed earlier that detect edges using filters, transforms and wavelets with ant colony optimization (ACO). Here we use ACO for edge detection of noisy ima...
A Noisy Channel Approach to Error Correction in Spoken Referring Expressions
We offer a noisy channel approach for recognizing and correcting erroneous words in referring expressions. Our mechanism handles three types of errors: it removes noisy input, inserts missing prepositions, and replaces mis-heard words (at present, they are replaced by generic words). Our mechanism was evaluated on a corpus of 295 spoken referring expressions, improving interpretation performance.
Publication date: 2010